In this example I will build a classifier for churn prediction using a dataset from the telecom industry. You can find the dataset on GitHub at the following link.
https://github.com/abulbasar/data/tree/master/Churn%20prediction
There are two files: churn-bigml-80.csv (the training set) and churn-bigml-20.csv (the test set).
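If you prefer not to download the files manually, pandas can also read them straight from GitHub's raw-content host. The raw URLs below are an assumption derived from the repository link above, not verified against the live repository.
import pandas as pd
# Assumed raw-content URLs, derived from the GitHub repository path above.
base = "https://raw.githubusercontent.com/abulbasar/data/master/Churn%20prediction"
df_train = pd.read_csv(base + "/churn-bigml-80.csv")
df_test = pd.read_csv(base + "/churn-bigml-20.csv")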
In [1]:
import xgboost as xgb
import pandas as pd
from sklearn import *
import matplotlib.pyplot as plt
%matplotlib inline
Load the training data
In [2]:
df_train = pd.read_csv("/data/churn-bigml-80.csv")
df_train.head()
Out[2]:
Let's check the number of records, the number of columns, the column types, and whether the data contains NULL values.
As we see, it contains 2665 records and 20 columns, with no null values. There are three categorical (object-typed) columns.
In [3]:
df_train.info()
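info() reports the non-null count per column; an explicit check, not in the original notebook, makes the no-nulls claim easy to verify:
# Total count of missing values across all columns; should be 0 here.
df_train.isnull().sum().sum()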
Let's check the distribution of the output class. As it shows, about 85% of the records are negative. This gives a sense of the desired accuracy: a useful model should score noticeably above the 85% majority-class baseline, i.e. closer to 90% or more.
In [4]:
df_train.Churn.value_counts()
Out[4]:
In [5]:
df_train.Churn.value_counts()/len(df_train)
Out[5]:
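The majority-class baseline can also be computed directly; a classifier that always predicts the majority class achieves exactly this accuracy, so any model we build should beat it.
# Fraction of the most frequent class = accuracy of a majority-class predictor.
df_train.Churn.value_counts(normalize=True).max()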
In [6]:
df_train.columns
Out[6]:
Load the test data and perform the same analysis as before. The test set is about 25% of the training set's size, consistent with the 80/20 split implied by the file names.
In [7]:
df_test = pd.read_csv("/data/churn-bigml-20.csv")
df_test.info()
In [8]:
df_test.Churn.value_counts()/len(df_test)
Out[8]:
In [9]:
len(df_test)/len(df_train)
Out[9]:
Separate the categorical and numeric columns so that they can be passed to a pipeline for the preprocessing steps. In the preprocessing steps, we impute missing values, one-hot encode the categorical columns, and standardize the numeric columns.
Although Area code is numeric, I am treating it as categorical since it is qualitative in nature.
In [10]:
cat_columns = ['State', 'Area code', 'International plan', 'Voice mail plan']
num_columns = ['Account length', 'Number vmail messages', 'Total day minutes',
'Total day calls', 'Total day charge', 'Total eve minutes',
'Total eve calls', 'Total eve charge', 'Total night minutes',
'Total night calls', 'Total night charge', 'Total intl minutes',
'Total intl calls', 'Total intl charge', 'Customer service calls']
In [11]:
target = "Churn"
X_train = df_train.drop(columns=target)
y_train = df_train[target]
X_test = df_test.drop(columns=target)
y_test = df_test[target]
In [12]:
cat_pipe = pipeline.Pipeline([
('imputer', impute.SimpleImputer(strategy='constant', fill_value='missing')),
('onehot', preprocessing.OneHotEncoder(handle_unknown='error', drop="first"))
])
num_pipe = pipeline.Pipeline([
('imputer', impute.SimpleImputer(strategy='median')),
('scaler', preprocessing.StandardScaler()),
])
preprocessing_pipe = compose.ColumnTransformer([
("cat", cat_pipe, cat_columns),
("num", num_pipe, num_columns)
])
X_train = preprocessing_pipe.fit_transform(X_train)
X_test = preprocessing_pipe.transform(X_test)
In [13]:
pd.DataFrame(X_train.toarray()).describe()
Out[13]:
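fit_transform returns a sparse matrix because of the one-hot encoding, which is why toarray() is needed above. A quick shape check, not in the original notebook, shows how many features the preprocessing produced:
# Rows are unchanged; columns expand from the 19 raw features to the one-hot-encoded width.
X_train.shape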
Build basic logistic regression and decision tree models and check their accuracy. The basic logistic regression model gives an accuracy of 85%.
In [14]:
est = linear_model.LogisticRegression(solver="liblinear")
est.fit(X_train, y_train)
y_test_pred = est.predict(X_test)
est.score(X_test, y_test)
Out[14]:
In [15]:
est = tree.DecisionTreeClassifier(max_depth=6)
est.fit(X_train, y_train)
y_test_pred = est.predict(X_test)
est.score(X_test, y_test)
Out[15]:
Print the classification report. The report shows that precision and recall for the positive class are quite poor. Accuracy is 85%. The confusion matrix shows a high number of false positives and false negatives.
In [16]:
print(metrics.classification_report(y_test, y_test_pred))
In [17]:
metrics.confusion_matrix(y_test, y_test_pred)
Out[17]:
Next, we build a similar model using XGBoost. The performance of this model is slightly better than that of the logistic regression model.
In [43]:
eval_sets = [
(X_train, y_train),
(X_test, y_test)
]
cls = xgb.XGBRFClassifier(silent=False,
scale_pos_weight=1,
learning_rate=0.1,
colsample_bytree = 0.99,
subsample = 0.8,
objective='binary:logistic',
n_estimators=100,
reg_alpha = 0.003,
max_depth=10,
gamma=10,
min_child_weight = 1
)
print(cls.fit(X_train
, y_train
, eval_set = eval_sets
, early_stopping_rounds = 10
, eval_metric = ["error", "logloss"]
, verbose = True
))
print("test accuracy: " , cls.score(X_test, y_test))
In [19]:
cls.evals_result()
Out[19]:
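Because the model was fit with an eval_set, the recorded metrics can be plotted to see how train and test logloss evolved and where early stopping kicked in. A minimal sketch, assuming the two evaluation sets are reported under the default keys validation_0 (train) and validation_1 (test):
results = cls.evals_result()
# validation_0 = first eval_set entry (train), validation_1 = second (test).
plt.plot(results['validation_0']['logloss'], label='train')
plt.plot(results['validation_1']['logloss'], label='test')
plt.xlabel('boosting round')
plt.ylabel('logloss')
plt.legend()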
In [20]:
y_test_pred = cls.predict(X_test)
In [21]:
metrics.confusion_matrix(y_test, y_test_pred)
Out[21]:
In [22]:
y_test_prob = cls.predict_proba(X_test)[:, 1]
y_test_prob
Out[22]:
In [23]:
auc = metrics.roc_auc_score(y_test, y_test_prob)
auc
Out[23]:
In [24]:
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_test_prob)
In [25]:
plt.rcParams['figure.figsize'] = 8,8
plt.plot(fpr, tpr)
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.title("ROC, auc: " + str(auc))
Out[25]:
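The thresholds array returned by roc_curve is unused above, but it is handy for picking an operating point. One common heuristic, added here as an illustration rather than part of the original notebook, is Youden's J statistic (tpr - fpr):
import numpy as np
# Choose the threshold that maximizes tpr - fpr (Youden's J).
best = np.argmax(tpr - fpr)
thresholds[best], tpr[best], fpr[best]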
Cross-validate the model
In [26]:
params = { 'objective': "binary:logistic"
, 'colsample_bytree': 0.9
, 'learning_rate': 0.01
, 'max_depth': 10
, 'alpha': 0.5
, 'min_child_weight': 1
, 'subsample': 1
, 'eval_metric': "auc"
}
# Note: n_estimators and verbose are not booster parameters; num_boost_round
# and verbose_eval below control the number of rounds and the logging instead.
data_dmatrix = xgb.DMatrix(data=X_train, label=y_train)
cv_results = xgb.cv(dtrain=data_dmatrix
, params=params
, nfold=5
, maximize=True  # the final metric, "auc", should be maximized; maximize expects a bool
, num_boost_round=100
, early_stopping_rounds=10
, metrics=["logloss", "error", "auc"]
, as_pandas=True
, seed=123
, verbose_eval=True
)
cv_results
cv_results
Out[26]:
In [27]:
cv_results[["train-error-mean"]].plot()
Out[27]:
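The plot above tracks only the training error. cv_results also holds the held-out fold metrics, so plotting train and test AUC together gives a quick overfitting check:
# Compare mean AUC on the training folds vs. the held-out folds.
cv_results[["train-auc-mean", "test-auc-mean"]].plot()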
Install graphviz to display the decision tree graph
$ conda install graphviz python-graphviz
In [28]:
plt.rcParams['figure.figsize'] = 50,50
xgb.plot_tree(cls, num_trees=0, rankdir='LR')
Out[28]:
These plots provide insight into how the model arrived at its final decisions and what splits it made to arrive at those decisions.
Note that if the above plot throws a 'graphviz' error on your system, consider installing the graphviz Python package via pip install graphviz. You may also need to install the system package, e.g. sudo apt-get install graphviz.
In [29]:
plt.rcParams['figure.figsize'] =15, 15
xgb.plot_importance(cls, )
Out[29]:
In [30]:
cls.feature_importances_
Out[30]:
In [31]:
one_hot_encoder = preprocessing_pipe.transformers_[0][1].steps[1][1]
one_hot_encoder
Out[31]:
In [32]:
one_hot_encoder.get_feature_names()
Out[32]:
In [33]:
preprocessing_pipe.transformers_[0][1]
Out[33]:
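Putting the previous steps together, the encoder's generated names followed by the numeric column list line up with the model's feature importances. A minimal sketch, assuming the ColumnTransformer emits the categorical block first and the numeric block second (the order they were declared in):
# Feature names in the order the ColumnTransformer produces them.
feature_names = list(one_hot_encoder.get_feature_names()) + num_columns
importances = pd.Series(cls.feature_importances_, index=feature_names)
importances.sort_values(ascending=False).head(10)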
In [34]:
parameters = {
'max_depth': range(2, 10, 1),
'n_estimators': range(60, 220, 40),
'learning_rate': [0.1, 0.01, 0.05]
}
cls = xgb.XGBRFClassifier(silent=False,
scale_pos_weight=1,
learning_rate=0.01,
colsample_bytree = 0.99,
subsample = 0.8,
objective='binary:logistic',
n_estimators=100,
reg_alpha = 0.003,
max_depth=10,
gamma=10,
min_child_weight = 1
)
grid_search = model_selection.GridSearchCV(
estimator=cls,
param_grid=parameters,
scoring = 'roc_auc',
n_jobs = 12,
cv = 10,
verbose=True,
return_train_score=True
)
grid_search.fit(X_train, y_train)
Out[34]:
In [35]:
grid_search.best_estimator_
Out[35]:
In [36]:
grid_search.best_params_
Out[36]:
In [37]:
grid_search.best_score_
Out[37]:
In [38]:
pd.DataFrame(grid_search.cv_results_)
Out[38]:
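A hypothetical follow-up, not in the original notebook: since GridSearchCV refits the best estimator on the full training set by default, it can be scored directly on the held-out test set to confirm the tuned model generalizes.
best = grid_search.best_estimator_
print("test accuracy:", best.score(X_test, y_test))
print("test AUC:", metrics.roc_auc_score(y_test, best.predict_proba(X_test)[:, 1]))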
In [39]:
folds = 5
param_comb = 5
cls = xgb.XGBRFClassifier(silent=False,
scale_pos_weight=1,
learning_rate=0.01,
colsample_bytree = 0.99,
subsample = 0.8,
objective='binary:logistic',
n_estimators=100,
reg_alpha = 0.003,
max_depth=10,
gamma=10,
min_child_weight = 1
)
skf = model_selection.StratifiedKFold(n_splits=folds, shuffle = True, random_state = 1001)
random_search = model_selection.RandomizedSearchCV(cls,
param_distributions=parameters,
n_iter=param_comb,
scoring='accuracy',
n_jobs=12,
cv=skf.split(X_train,y_train),
verbose=3,
random_state=1001 )
random_search.fit(X_train, y_train)
Out[39]:
In [40]:
random_search.best_score_, random_search.best_params_
Out[40]:
In [41]:
pd.DataFrame(random_search.cv_results_)
Out[41]:
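As with the grid search, the randomized search's refit best estimator can be checked against the test set; a short sketch along the same lines as above:
best = random_search.best_estimator_
print("test accuracy:", best.score(X_test, y_test))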